Developing Chinese TAK for Computer Directly

Authors

  • Guo-Ping HU
  • Ben-Feng CHEN
  • Ren-Hua WANG
Abstract

As text analysis has developed, the quality of the knowledge available to the computer has become increasingly crucial to analysis accuracy, and many researchers have worked on developing text analysis knowledge (TAK). So far, however, apart from the lexicon, TAK for computers (such as phrase structure grammars and unregistered-word recognition rules) has been built only on a small scale. Although large-scale corpora with word segmentation annotation, and even treebanks, have been developed, these projects contribute little to the text parser relative to the huge annotation workload, especially for Chinese. Considering the disadvantages of the data-mining and training techniques used in the text analysis field, and targeting one TTS system, this paper presents a complete set of solutions for developing Chinese TAK for computers: a lexicon tree, a nesting phrase structure grammar, a combination-bigram, a computer-aided development flow, and automatic checking and improvement of TAK quality using the treebank (which is a by-product of this development). The paper also shows that a text analysis system based on the resulting TAK reaches an accuracy of 80% on a closed test set of 24,700 sentences and approximately 50% on an open corpus. It concludes that developing Chinese TAK for computers directly is more effective than other approaches under the same workload.
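The abstract names a lexicon tree among its components but does not describe its layout. As a rough illustration only, the sketch below assumes the lexicon tree is a character-level trie used for forward maximum-matching word segmentation; the class names, the `segment` helper, and the toy lexicon are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


class LexiconNode:
    """One node of the (assumed) lexicon tree; children are keyed by one character."""

    def __init__(self) -> None:
        self.children: Dict[str, "LexiconNode"] = {}
        self.is_word = False


class LexiconTree:
    """Character-level trie over a word list (an assumption, not the paper's design)."""

    def __init__(self, words: List[str]) -> None:
        self.root = LexiconNode()
        for word in words:
            self.add(word)

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, LexiconNode())
        node.is_word = True

    def longest_match(self, text: str, start: int) -> int:
        """Length of the longest lexicon word beginning at `start`, or 0 if none."""
        node = self.root
        best = 0
        for offset in range(start, len(text)):
            node = node.children.get(text[offset])
            if node is None:
                break
            if node.is_word:
                best = offset - start + 1
        return best


def segment(tree: LexiconTree, text: str) -> List[str]:
    """Greedy forward maximum matching; unmatched characters become single tokens."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        length = tree.longest_match(text, i) or 1
        tokens.append(text[i:i + length])
        i += length
    return tokens


# Hypothetical toy lexicon, for illustration only.
toy_lexicon = ["中国", "中国人", "人民", "银行"]
tree = LexiconTree(toy_lexicon)
print(segment(tree, "中国人民银行"))  # ['中国人', '民', '银行']
```

On this toy input the greedy match picks 中国人 and strands 民, rather than producing 中国 / 人民 / 银行. Ambiguities of this kind are one reason a lexicon alone is not enough and further knowledge, such as the grammar and bigram components the abstract lists, is needed.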

Similar resources

Developing Universal Dependencies for Mandarin Chinese

This article proposes a Universal Dependency Annotation Scheme for Mandarin Chinese, including POS tags and dependency analysis. We identify cases of idiosyncrasy of Mandarin Chinese that are difficult to fit into the current schema which has mainly been based on the descriptions of various Indo-European languages. We discuss differences between our scheme and those of the Stanford Chinese Depe...


Higher Median Levels of Free β-hCG and PAPP-A in the First Trimester of Pregnancy in a Chinese Ethnic Group: Implication for First Trimester Combined Screening for Down’s Syndrome in the Chinese Population

Objective: To study the effect of Chinese ethnicity on the medians of free β-hCG and PAPP-A in the first trimester of pregnancy. Methods: The data of 943 women undergoing first trimester combined screening for fetal Down syndrome were analysed to derive the Chinese-specific medians. The calculated risk of Down syndrome based on these Chinese-specific medians was compared with that based on the ...


The clinical characteristics of Chinese Takayasu’s arteritis patients: a retrospective study of 411 patients over 24 years

Background: We aimed to investigate the clinical characteristics of 411 Chinese Takayasu's arteritis (TAK) patients using a retrospective analysis. Methods: We retrospectively reviewed 810 medical charts of patients with a diagnosis of TAK who were admitted to Peking Union Medical College Hospital from 1990 to 2014. 411 patients with a complete dataset were finally included in the analysis. The...


Efficient Reverse Converter for Three Modules Set {2^n-1,2^(n+1)-1,2^n} in Multi-Part RNS

The Residue Number System is a numerical system in which arithmetic operations are performed in parallel. One of the main factors affecting the system’s performance is the complexity of the reverse converter. The complexity of this part should not offset the speed gained by the parallel arithmetic unit. Therefore, in this paper a high-speed converter for the moduli set {2^n-1, ...



Publication date: 2002